Introduction
Strings are, of course, a series of single characters. In one sense you can think of a string as an array of characters, and the MID$ function lets us index into that array. Remember from the first Strings article that the MID$ function takes three arguments:
MID$(string$, index, n)
string$ = the source string
index = the starting character in the string to begin the extraction
n = the number of characters to extract
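As a quick illustration of those three arguments, here is a minimal sketch that pulls one word out of a sentence. The sample string and the positions are chosen only for this example.

```basic
' extract 3 characters starting at character 8
a$ = "My new red wagon"
print mid$(a$, 8, 3)    ' prints: red
```

Character positions start at 1, so the "r" in "red" is character number 8 in this string.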
Treating the string like an array, we can easily spin through a whole string, printing each character individually using a FOR-NEXT loop and the MID$ function. That works like this:
a$ = "My new red wagon is very shiny!"
for x = 1 to len(a$)
print mid$(a$,x,1)
next x
The trick to making the program see every character in the string is knowing how long the string is. As you may recall from earlier discussions, this is done using the LEN function. We embedded the LEN function directly into our FOR-NEXT loop in the code above, which causes the loop to run from the first character (character number 1) to the last character in the string. This way we do not have to know the length of the string in advance to spin through all of it.
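The same LEN-driven loop can do more than print. As a small sketch (the counter variable "count" is just a name picked for this example), here it tallies the spaces in the string instead of printing each character:

```basic
a$ = "My new red wagon is very shiny!"
count = 0
for x = 1 to len(a$)
    ' look at one character at a time, just like the printing loop
    if mid$(a$, x, 1) = " " then count = count + 1
next x
' this sentence has 7 words, so 6 spaces are counted
print "Spaces found: "; count
```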
In a recent project I needed to remove all the double quotes from strings that were imported into my program from a file. I did the file import one line at a time using the LINE INPUT statement, so I had to evaluate many strings of varying lengths. The code we wrote above can be easily adapted to accomplish this need.
a$ = "My new " + chr$(34) + "red" + chr$(34) + " wagon is very shiny!"
print "Original: ";a$
a$ = removeQuotes$(a$)
print "New: ";a$
function removeQuotes$(src$)
    for x = 1 to len(src$)
        if mid$(src$,x,1) <> chr$(34) then out$ = out$ + mid$(src$,x,1)
    next x
    removeQuotes$ = out$
end function
Notice in the function that if the character being examined (using the MID$ function) is NOT a quote (ASCII 34, as you will recall from part 2 of this series), then we add the character to a buffer that holds our return string. This causes the function to skip the quote characters whenever they are encountered. Run the code and check it out!
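If you find yourself stripping different characters in different places, the same idea can be generalized. The following is only a sketch: removeChar$ and its second parameter ch$ are names invented for this example, not part of the original project.

```basic
' strip the commas out of a formatted number
print removeChar$("1,024,768", ",")    ' prints: 1024768
end

' hypothetical helper: returns src$ with every occurrence of the
' single character ch$ removed
function removeChar$(src$, ch$)
    for x = 1 to len(src$)
        c$ = mid$(src$, x, 1)
        if c$ <> ch$ then result$ = result$ + c$
    next x
    removeChar$ = result$
end function
```

Passing chr$(34) as the second argument would make it behave like removeQuotes$.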
If you want to replace the quote character with another character, it would require a slightly more complex IF-THEN statement. Here is an example that replaces the comma character with a space character:
function replaceComma$(src$)
    for x = 1 to len(src$)
        if mid$(src$,x,1) = "," then
            out$ = out$ + " "
        else
            out$ = out$ + mid$(src$,x,1)
        end if
    next x
    replaceComma$ = out$
end function
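To try the function out, call it the same way removeQuotes$ was called earlier. This is a small usage sketch; the sample string is made up for the example.

```basic
a$ = "red,green,blue"
print "Original: "; a$
print "New: "; replaceComma$(a$)    ' prints: New: red green blue
end

function replaceComma$(src$)
    for x = 1 to len(src$)
        if mid$(src$,x,1) = "," then
            out$ = out$ + " "
        else
            out$ = out$ + mid$(src$,x,1)
        end if
    next x
    replaceComma$ = out$
end function
```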
String parsing is one of the most complex things most people will do in programming. Every parsing need is slightly different from the last one. I recently completed a little project that had to manage its own settings file. I created some simple rules for the settings file, but since it was likely the settings file would be (and should be) edited by an outside party, I had to parse it carefully as it was read.
The rules were simple:
The program to process this simple set of specifications is pretty complex. Consider what you must do to process this:
Let's dig into this:
First we will need a working dat file to play with and test. Here is my sample dat file:
## This is a sample settings file
## These are comments - blank lines are skipped
[workingdir] This is the working directory
c:\lb4
[name] Name of the program
Cool Stuff Demo
[end]
You can copy this to a working directory, or you can get the actual dat file from the Newsletter archive for this issue.
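If you would rather not create the file by hand, a short program can write it for you. This is a minimal sketch that assumes the file should be named settings.dat and created in the program's current directory; it writes exactly the lines shown above.

```basic
' write the sample settings file used in this article
open "settings.dat" for output as #1
print #1, "## This is a sample settings file"
print #1, "## These are comments - blank lines are skipped"
print #1, "[workingdir] This is the working directory"
print #1, "c:\lb4"
print #1, "[name] Name of the program"
print #1, "Cool Stuff Demo"
print #1, "[end]"
close #1
print "settings.dat written"
end
```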
First open the file (we called it settings.dat). We will set a flag that tells us we are not expecting a setting value. We will also check to make sure there is something in the dat file:
open "settings.dat" for input as #1
sval = 0
if eof(#1) < 0 then
    'there is nothing in the file - close it and quit
    close #1
    end
end if
Now we will use a simple loop to read the file using the LINE INPUT statement:
'set up a loop to read the file
do
    line input #1, a$
    print a$
loop while eof(#1) = 0
close #1
end
This simple skeleton reads the file one line at a time, prints each line to the console, and then reads the next. This proves we can get the input text. Now we need to start parsing it. Let's handle comments and blank lines.
We are going to slowly replace the body of this loop. We handle blank lines and comment lines by ignoring them, which is pretty easy to do. Examine this code, which is intended to replace the print statement in the loop above:
lenght = len(a$)
if lenght > 0 then
    'if this is not a blank line, process it...
    if left$(a$,2) <> "##" then
        'this is not a comment - process it...
        'print what is left for now
        print a$
    end if
end if
If you put it all together and run it against the test file you will get the following output:
[workingdir] This is the working directory
c:\lb4
[name] Name of the program
Cool Stuff Demo
[end]
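One thing the check above does not cover is a line that contains only spaces, or a comment that has been indented. Since the file is meant to be edited by hand, you may want to tolerate both. The following sketch assumes that is the behavior you want; skipLine is a hypothetical helper name, and it uses TRIM$ to strip leading and trailing spaces before testing the line.

```basic
print skipLine("")                       ' prints: 1
print skipLine("   ")                    ' prints: 1
print skipLine("  ## indented comment")  ' prints: 1
print skipLine("c:\lb4")                 ' prints: 0
end

' hypothetical helper: returns 1 if the line should be ignored
' (blank, spaces only, or a ## comment), otherwise 0
function skipLine(a$)
    t$ = trim$(a$)
    skipLine = 0
    if len(t$) = 0 then skipLine = 1
    if left$(t$, 2) = "##" then skipLine = 1
end function
```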
We need to handle the END condition. Do that by inserting the following code one line before the print a$ statement:
if lower$(a$) = "[end]" then exit do
This leaves us with the real parsing job. It is really not overly complex for our example. We know we are either looking for a setting or a setting value. Our rules require that a setting precede its value. We set a flag called "sval" to zero. When it is zero, we are looking for a setting. When it is one, we are looking for a setting value. If they come out of order, we will report an error.
We know that settings always start with a bracket. In the real program I validated my setting values against a master table of valid settings. In this case we will not be adding that level of complexity. We will say that if sval = 0 then we are looking for the first character to be a "[" - anything else is an error.
The loop is getting pretty complex. Here is what it looks like incorporating this logic:
'set up a loop to read the file
do
    line input #1, a$
    lenght = len(a$)
    if lenght > 0 then
        'if this is not a blank line, process it...
        if left$(a$,2) <> "##" then
            'this is not a comment - process it...
            'stop reading when the end marker is found
            if lower$(a$) = "[end]" then exit do
            if sval = 0 then
                if left$(a$,1) <> "[" then
                    print "there was an error - setting expected"
                    end
                else
                    'parse the setting from the text string
                    print "setting: ";a$
                    sval = 1
                end if
            end if
        end if
    end if
loop while eof(#1) = 0
Note that if sval = 0 and you have a valid setting, then we print the setting (for now) and set sval equal to 1. We will need to reset it to keep reading more settings, but that comes later.
The first matter at hand is parsing the setting string once we know it is valid. We can apply some of the techniques we have been discussing. We need to locate the closing bracket (if it is not there, we have an error), then extract the setting name from between the brackets. We can locate the closing bracket with the INSTR function.
Use:
pos = INSTR(a$,"]",1)
Using that position stored in pos we can calculate the part of the string to extract to get the setting name. Here is that portion of the parsing loop:
if sval = 0 then
    if left$(a$,1) <> "[" then
        print "there was an error - setting expected"
        end
    else
        'parse the setting from the text string
        pos = INSTR(a$,"]",1)
        if pos = 0 then
            print "error parsing setting - improper format"
            end
        else
            'we can now extract the setting name
            setting$ = mid$(a$,2,pos-2)
            print "setting: ";setting$
            sval = 1
        end if
    end if
end if
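To see the arithmetic in isolation, here is a small sketch that runs the same INSTR and MID$ calls against one line from the sample file. Pulling out the description text that follows the bracket is an extra step added for illustration; the variable name desc$ is hypothetical.

```basic
a$ = "[workingdir] This is the working directory"
pos = instr(a$, "]", 1)
print pos                          ' prints: 12

' skip the "[" at position 1 and stop just before the "]"
setting$ = mid$(a$, 2, pos - 2)
print setting$                     ' prints: workingdir

' everything after the "]" is the description (an assumed extra step)
desc$ = trim$(mid$(a$, pos + 1))
print desc$                        ' prints: This is the working directory
end
```

The closing bracket is at position 12, and the name starts at position 2, so the name is pos - 2 = 10 characters long.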
What remains is to handle the setting value. Since it can be any string, we will simply echo it to the console. We will also change the sval flag back to zero so we can process the next setting and value pair.
It only took three additional lines of code. Here is the entire parsing program:
open "settings.dat" for input as #1
sval = 0
if eof(#1) < 0 then
    'there is nothing in the file - close it and quit
    close #1
    end
end if
'set up a loop to read the file
do
    line input #1, a$
    lenght = len(a$)
    if lenght > 0 then
        'if this is not a blank line, process it...
        if left$(a$,2) <> "##" then
            'this is not a comment - process it...
            'stop reading when the end marker is found
            if lower$(a$) = "[end]" then exit do
            if sval = 0 then
                if left$(a$,1) <> "[" then
                    print "there was an error - setting expected"
                    end
                else
                    'parse the setting from the text string
                    pos = INSTR(a$,"]",1)
                    if pos = 0 then
                        print "error parsing setting - improper format"
                        end
                    else
                        'we can now extract the setting name
                        setting$ = mid$(a$,2,pos-2)
                        print "setting name : ";setting$
                        sval = 1
                    end if
                end if
            else
                print "setting value: ";a$
                sval = 0
            end if
        end if
    end if
loop while eof(#1) = 0
close #1
end
Notice how the program, in spite of its complexity, does not use a single GOTO. This is structured code. All the nested IF-THEN statements can make the code a challenge to read at times, but it is clean and easy to maintain.
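In a real program you would store each value rather than echo it. One way to sketch that, assuming Liberty BASIC 4 (for the GLOBAL statement), is a small sub that receives the setting name and its value and files the value away with SELECT CASE. The names storeSetting, workingDir$ and progName$ are invented for this example. In the parsing loop you would call it as "call storeSetting setting$, a$" at the point where the setting value is printed.

```basic
' hypothetical variables that hold the parsed settings
global workingDir$, progName$

' simulate what the parsing loop would pass in
call storeSetting "workingdir", "c:\lb4"
call storeSetting "name", "Cool Stuff Demo"
call storeSetting "colour", "blue"

print "Working dir : "; workingDir$
print "Program name: "; progName$
end

' hypothetical helper: store a value under its setting name
sub storeSetting setting$, value$
    select case lower$(setting$)
        case "workingdir"
            workingDir$ = value$
        case "name"
            progName$ = value$
        case else
            ' a simple stand-in for validating against a master table
            print "unknown setting: "; setting$
    end select
end sub
```

The CASE ELSE branch is where an unrecognized setting name is reported, which here happens for "colour".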
These are some of the many challenges of working with strings. Parsing can be tricky, but the main hurdle is really understanding your data. If you know your data well and know what you want to do with it, you can parse it.
There are more articles on parsing in the newsletter. I encourage you to check them out. This concludes the working with strings series.
Thanks - Brad